AITopics | yvette graham

Collaborating Authors

yvette graham

Audio-Based Crowd-Sourced Evaluation of Machine Translation Quality

Haq, Sami Ul, Castilho, Sheila, Graham, Yvette

arXiv.org Artificial Intelligence, Sep-18-2025

Machine Translation (MT) has achieved remarkable performance, with growing interest in speech translation and multimodal approaches. However, despite these advancements, MT quality assessment remains largely text-centric, typically relying on human experts who read and compare texts. Since many real-world MT applications (e.g., Google Translate Voice Mode, iFLYTEK Translator) involve translation being spoken rather than printed or read, a more natural way to assess translation quality would be through speech as opposed to text-only evaluations. This study compares text-only and audio-based evaluations of 10 MT systems from the WMT General MT Shared Task, using crowd-sourced judgments collected via Amazon Mechanical Turk. We additionally performed statistical significance testing and self-replication experiments to test the reliability and consistency of the audio-based approach. Crowd-sourced assessments based on audio yield rankings largely consistent with text-only evaluations but, in some cases, identify significant differences between translation systems. We attribute this to speech being a richer, more natural modality and propose incorporating speech-based assessments into future MT evaluation frameworks.
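The abstract does not say which significance test was used. As an illustration only, the sketch below assumes an approximate randomization (permutation) test on the difference in mean human scores between two systems. The function name `permutation_test`, its parameters, and the example scores are hypothetical and do not come from the paper.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided approximate randomization test on the difference in means.

    scores_a, scores_b: human ratings collected for two MT systems
    (hypothetical data; the paper's actual test may differ).
    Returns an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled ratings to the two systems.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Made-up ratings on a 0-100 scale, for illustration only.
    system_a = [72, 65, 80, 77, 69, 74, 81, 70]
    system_b = [60, 58, 71, 66, 62, 59, 68, 64]
    print(f"p = {permutation_test(system_a, system_b):.4f}")
```

A small p-value would suggest that the gap between the two systems is unlikely to arise from random assignment of ratings alone; the threshold for calling a difference significant is a choice left to the evaluator.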

artificial intelligence, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2509.14023

Country:

Europe (1.00)
Asia (0.68)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.90)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date overview

Han, Lifeng, Gladkoff, Serge

arXiv.org Artificial Intelligence, Aug-7-2025

Starting from the 1950s, Machine Translation (MT) has been tackled with different scientific solutions, from rule-based methods, example-based and statistical models (SMT), to hybrid models and, in very recent years, neural models (NMT). While NMT has achieved a huge quality improvement in comparison to conventional methodologies, by taking advantage of the huge amount of parallel corpora available from the internet and recently developed computational power available at an acceptable cost, it still struggles to achieve real human parity in many domains and most language pairs, if not all of them. Along the long road of MT research and development, quality evaluation metrics have played very important roles in MT advancement and evolution. In this tutorial, we overview the traditional human judgement criteria, automatic evaluation metrics, unsupervised quality estimation models, as well as the meta-evaluation of the evaluation methods. Among these, we will also cover very recent work in the MT evaluation (MTE) field that takes advantage of large pre-trained language models to customise automatic metrics towards the exact language pairs and domains deployed. In addition, we also introduce statistical confidence estimation of the sample size needed for human evaluation in real practice simulation. Full tutorial material is available to download at https://github.com/poethan/LREC22_MetaEval_Tutorial.

artificial intelligence, computational linguistic, natural language, (13 more...)

arXiv.org Artificial Intelligence

1605.04515

Country:

North America > United States > Colorado (0.14)
Europe > United Kingdom > England (0.14)

Genre:

Instructional Material (1.00)
Overview (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Findings of the First Workshop on Simulating Conversational Intelligence in Chat

Graham, Yvette, Qureshi, Mohammed Rameez, Khalid, Haider, Lampouras, Gerasimos, Iacobacci, Ignacio, Liu, Qun

arXiv.org Artificial Intelligence, Feb-9-2024

The aim of this workshop is to bring together experts working on open-domain dialogue research. In this rapidly advancing research area many challenges still exist, such as learning information from conversations and engaging in realistic and convincing simulation of human intelligence and reasoning. SCI-CHAT follows previous workshops on open-domain dialogue but with a focus on the simulation of intelligent conversation as judged in a live human evaluation. Models aim to include the ability to follow a challenging topic over a multi-turn conversation, while positing, refuting and reasoning over arguments. The workshop included both a research track and a shared task. The main goal of this paper is to provide an overview of the shared task and a link to an additional paper that will include an in-depth analysis of the shared task results following presentation at the workshop.

computational linguistic, evaluation, yvette graham, (11 more...)

arXiv.org Artificial Intelligence

2402.0642

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.16)
Europe > United Kingdom > England > Greater London > London (0.05)
Asia > China > Hong Kong (0.05)
(8 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.55)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.35)

Automatic Text Evaluation through the Lens of Wasserstein Barycenters

Colombo, Pierre, Staerman, Guillaume, Clavel, Chloe, Piantanida, Pablo

arXiv.org Artificial Intelligence, Sep-9-2021

A new metric, BaryScore, to evaluate text generation based on deep contextualized embeddings (e.g., BERT, RoBERTa, ELMo) is introduced. This metric is motivated by a new framework relying on optimal transport tools, i.e., Wasserstein distance and barycenter. By modelling the layer output of deep contextualized embeddings as a probability distribution rather than as a vector embedding, this framework provides a natural way to aggregate the different outputs through the Wasserstein space topology. In addition, it provides theoretical grounds for our metric and offers an alternative to available solutions (e.g., MoverScore and BertScore). Numerical evaluation is performed on four different tasks: machine translation, summarization, data2text generation and image captioning. Our results show that BaryScore outperforms other BERT-based metrics and exhibits more consistent behaviour, in particular for text summarization.
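To make the two optimal transport tools named in the abstract concrete, here is a toy sketch in one dimension, where both have closed forms. It is not the BaryScore implementation, which operates on high-dimensional contextual embeddings. The function names and the sample values are hypothetical, and the sketch assumes equally weighted samples of equal size.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size, equally weighted
    1-D samples. In one dimension the optimal coupling matches the
    sorted values, so the distance is the mean absolute difference
    between order statistics."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    if not xs:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


def barycenter_1d(samples):
    """Wasserstein-2 barycenter of several equal-size 1-D samples with
    uniform weights. In one dimension it is obtained by averaging the
    quantile functions, i.e. averaging the sorted values position by
    position."""
    if not samples:
        raise ValueError("need at least one sample")
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes samples of equal size")
    columns = zip(*(sorted(s) for s in samples))
    return [sum(col) / len(samples) for col in columns]


if __name__ == "__main__":
    # Made-up scalar "layer outputs" standing in for real embeddings.
    layer_1 = [0.0, 2.0]
    layer_2 = [2.0, 4.0]
    print(wasserstein_1d(layer_1, layer_2))
    print(barycenter_1d([layer_1, layer_2]))
```

In higher dimensions neither quantity has such a simple form, and a numerical optimal transport solver would be needed instead of sorting.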

arxiv preprint arxiv, computational linguistic, proceedings, (11 more...)

arXiv.org Artificial Intelligence

2108.12463

Country:

Europe > Italy > Tuscany > Florence (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(15 more...)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation

Khashabi, Daniel, Stanovsky, Gabriel, Bragg, Jonathan, Lourie, Nicholas, Kasai, Jungo, Choi, Yejin, Smith, Noah A., Weld, Daniel S.

arXiv.org Artificial Intelligence, Jan-16-2021

Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, is so far limited to tasks that can be reliably evaluated in an automatic manner. This work introduces GENIE, an extensible human evaluation leaderboard, which brings the ease of leaderboards to text generation tasks. GENIE automatically posts leaderboard submissions to crowdsourcing platforms asking human annotators to evaluate them on various axes (e.g., correctness, conciseness, fluency) and compares their answers to various automatic metrics. We introduce several datasets in English to GENIE, representing four core challenges in text generation: machine translation, summarization, commonsense reasoning, and machine comprehension. We provide formal granular evaluation metrics and identify areas for future research. We make GENIE publicly available and hope that it will spur progress in language generation models as well as their automatic and manual evaluation.
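The abstract says human judgments are compared to automatic metrics but does not say how. One common way to make such a comparison is a correlation coefficient, and the sketch below assumes Pearson correlation purely for illustration. The function name `pearson` and all scores are hypothetical and are not taken from GENIE.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two paired scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up system-level scores, for illustration only.
    human_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
    metric_scores = [0.31, 0.36, 0.30, 0.42, 0.33]
    print(f"r = {pearson(human_scores, metric_scores):.3f}")
```

A value near 1 would indicate that the automatic metric ranks systems much as the human annotators do; rank-based coefficients such as Spearman or Kendall are alternatives when only the ordering matters.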

evaluation, machine translation, proc, (13 more...)

arXiv.org Artificial Intelligence

2101.06561

Country: